A predictive model for post-thoracoscopic surgery pulmonary complications based on the PBNN algorithm

We constructed an early prediction model for postoperative pulmonary complications after thoracoscopic surgery using machine learning and deep learning algorithms. The artificial intelligence prediction models were built in Python. Correlation analysis showed that postoperative pulmonary complications were positively correlated with age and surgery duration, and negatively correlated with serum albumin. Feature weighting with the light gradient boosting machine (LGBM) algorithm indicated that single lung ventilation duration, history of smoking, surgery duration, ASA score, and blood glucose were the main factors associated with postoperative pulmonary complications. In the test group, the algorithms performed as follows in predicting pulmonary complications after thoracoscopy: in terms of accuracy, the two best algorithms were logistic regression (0.831) and light gradient boosting machine (0.827); in terms of precision, the two best algorithms were gradient boosting (0.75) and light gradient boosting machine (0.742); in terms of recall, the three best algorithms were Gaussian naive Bayes (0.581), logistic regression (0.532), and pruning Bayesian neural network (0.516); in terms of F1 score, the two best algorithms were logistic regression (0.589) and pruning Bayesian neural network (0.566); and in terms of area under the curve (AUC), the two best algorithms were light gradient boosting machine (0.873) and pruning Bayesian neural network (0.869). The results of this study suggest that the pruning Bayesian neural network (PBNN) can be used to assess the possibility of pulmonary complications after thoracoscopy, and to identify high-risk groups prior to surgery.

www.nature.com/scientificreports/

their superiority over traditional methods in predicting patient prognosis in a variety of settings and disease conditions 15. A significant advantage of machine learning techniques lies in their ability to produce more stable predictions by handling complex nonlinear relationships between predictive variables. For example, a recent study has suggested that a variety of AI algorithms can be used to construct prediction models for difficult intubations 16. Additionally, it has been demonstrated that intelligent algorithms can be used to predict the likelihood of pulmonary complications after emergency gastrointestinal procedures 17. Machine learning and deep learning techniques can also predict intraoperative bleeding in patients undergoing hepatectomy 18.
The aim of this study was to construct an early prediction model for PPCs after thoracoscopic surgery using machine learning and deep learning algorithms.

Study population
This study analyzed patients who had undergone thoracoscopic surgery, using data from the public BioStudies database. A total of 905 patients who had undergone thoracoscopic surgery were included. Exclusion criteria were: emergency and trauma patients; age < 18 years; patients with preoperative pulmonary infection and/or pleural effusion; patients who had undergone open thoracotomy or whose surgery had been canceled; patients who had undergone a second operation; and patients whose relevant data were incomplete or missing. This study was approved by the Ethics Committee at the First Affiliated Hospital of Zhengzhou University (2020-KY-130). It was exempted from informed consent because it was a retrospective study.

Data collection
Demographic and clinical variables were collected for the patients who had undergone thoracoscopic surgery. General information included age, sex, American Society of Anesthesiologists (ASA) classification, body mass index (BMI), and history of hypertension, diabetes, stroke, heart disease, chronic obstructive pulmonary disease (COPD), smoking, and alcohol use. Preoperative laboratory tests included white blood cell, red blood cell, and platelet counts. Information on surgery and anesthetic management included surgery duration and single lung ventilation duration. The outcome of interest was PPCs, which, according to the European perioperative clinical outcome definitions, comprise respiratory infection, respiratory failure, pleural effusion, pulmonary atelectasis, pneumothorax, bronchospasm, and aspiration pneumonia 19,20. A patient was considered to have a PPC if any one of these complications was detected within the first seven days after surgery.
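The outcome rule described above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the record layout (a list of complication name and date pairs), the function name `has_ppc`, and the choice to count day 0 through day 7 inclusive are all assumptions made for the example.

```python
from datetime import date

# Complication types counted as a PPC in the text above.
PPC_TYPES = {
    "respiratory infection",
    "respiratory failure",
    "pleural effusion",
    "pulmonary atelectasis",
    "pneumothorax",
    "bronchospasm",
    "aspiration pneumonia",
}


def has_ppc(surgery_date, events, window_days=7):
    """Return True if any listed complication falls inside the window.

    `events` is a hypothetical list of (complication_name, event_date) pairs.
    Assumption: the day of surgery is day 0 and day 7 is still inside the window.
    """
    for name, event_date in events:
        days_after = (event_date - surgery_date).days
        if name.lower() in PPC_TYPES and 0 <= days_after <= window_days:
            return True
    return False


# Toy records, invented for illustration only.
surgery = date(2020, 3, 1)
print(has_ppc(surgery, [("Pneumothorax", date(2020, 3, 4))]))   # inside the window
print(has_ppc(surgery, [("Pneumothorax", date(2020, 3, 20))]))  # outside the window
```

Encoding the outcome as a single binary label per patient is what allows the classifiers in the next section to be trained on it directly.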

AI algorithms
The AI prediction models were built in Python. They included both machine learning and deep learning algorithms: logistic regression, decision tree, random forest, gradient boosting decision tree (Gradient Boosting), extreme gradient boosting (XGB), light gradient boosting machine (LGBM), linear support vector classifier (LinearSVC), multilayer perceptron classifier (MLPC), Gaussian naive Bayes (gnb), K-nearest neighbors (knn), AdaBoost (adab), convolutional neural network (CNN), long short-term memory (LSTM), convolutional neural network + recurrent neural network (CNNRNN), convolutional neural network + long short-term memory (CNNLSTM), and pruning Bayesian neural network (PBNN). The dataset was first divided into training and test groups at a ratio of 7:3. The AI algorithms were then used to build prediction models on the training group with fivefold cross-validation. Next, each model's performance was verified in the test group. The LGBM algorithm was used to analyze and rank the weight of each variable in accounting for PPCs. Pearson correlation was used to analyze the relationships between the individual variables. The data were then normalized. Any missing data were processed using the SimpleImputer package. The models were evaluated based on the ROC curve, accuracy, precision, F1 score, recall, Matthews correlation coefficient (MCC), specificity, and mean squared error (MSE).
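The 7:3 split followed by fivefold cross-validation on the training group can be sketched as follows. This is a minimal standard-library illustration, not the authors' pipeline: the function names, the fixed seeds, and the decision to stratify the split by outcome class are assumptions (the text does not say whether the split was stratified), and the labels are synthetic.

```python
import random


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Split sample indices into train/test while keeping the class ratio similar.

    Assumption: stratification by outcome; the source only states a 7:3 ratio.
    """
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def k_folds(indices, k=5, seed=0):
    """Yield (fit_indices, validation_indices) pairs for k-fold cross-validation."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        fit = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield fit, folds[i]


# Synthetic labels for illustration only: 30 positives and 70 negatives.
labels = [1] * 30 + [0] * 70
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
for fit_idx, val_idx in k_folds(train_idx):
    print(len(fit_idx), len(val_idx))
```

The test indices are set aside before any cross-validation, so the folds are drawn from the training group only. In a setup like this, imputation and normalization statistics would typically be computed on the fitting portion alone so that no information from the held-out samples leaks into model building.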

General statistical analysis
R was used for the general statistical analysis. Count data were expressed as percentages and compared between groups with a χ2 test. Measurement data conforming to a normal distribution were expressed as mean ± standard deviation (x̄ ± s) and compared between groups with a t-test. A P value less than 0.05 was considered significant.
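For readers who want to see what the χ2 comparison of count data computes, the following is a small Python sketch for a 2×2 table. It is an illustration only: the study used R, the function name `chi_square_2x2` and the counts are invented for the example, and the sketch omits the continuity correction (which R's `chisq.test` applies to 2×2 tables by default), so its output will not match R's default output exactly.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test, without continuity correction, for [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom the statistic is a squared standard normal,
    # so the upper-tail p-value equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Invented counts: rows are two patient groups, columns are PPC yes / no.
stat, p = chi_square_2x2(30, 70, 15, 85)
print(f"chi2 = {stat:.3f}, p = {p:.4f}, significant at 0.05: {p < 0.05}")
```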

Results of AI algorithms for predicting pulmonary complications after thoracoscopy in the training group
In terms of accuracy, the two best algorithms were adab (0.915) and CNNRNN (0.912); in terms of precision, the best algorithm was RandomForest (0.944); in terms of recall, the two best algorithms were CNNRNN (0.752) and adab (0.745); in terms of F1 score, the best algorithm was CNNRNN (0.796); and in terms of AUC, the two best algorithms were CNNRNN (0.959) and RandomForest (0.918). The algorithms with an MCC greater than 0.6 were CNNRNN, adab, logistic regression, linear SVC, and PBNN. Except for gnb, all algorithms had a specificity greater than 0.900 (Table 2 and Fig. 3).
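The threshold-based metrics reported here (accuracy, precision, recall, F1 score, specificity, and MCC) all derive from the four cells of a binary confusion matrix. The sketch below shows the standard formulas; the function name and the counts are invented for illustration and do not correspond to any model in Table 2.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0        # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


# Invented counts for illustration only.
for name, value in classification_metrics(tp=40, fp=10, tn=130, fn=20).items():
    print(f"{name}: {value:.3f}")
```

When the positive class is the minority, as is usual for complication outcomes, accuracy and specificity can stay high even when recall is modest, which is why F1 and MCC are informative alongside them.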

Results of AI algorithms for predicting pulmonary complications after thoracoscopy in the test group
In terms of accuracy, the two best algorithms were LogisticRegression (0.831) and LGBM (0.827); in terms of precision, the two best algorithms were GradientBoosting (0.75) and LGBM (0.742); in terms of recall, the three best algorithms were gnb (0.581), LogisticRegression (0.532), and PBNN (0.516); in terms of F1 score, the two best algorithms were LogisticRegression (0.589) and PBNN (0.566); and in terms of AUC, the two best algorithms were LGBM (0.873) and PBNN (0.869). The algorithms with the highest MCC values were logistic regression and PBNN. Except for CNNLSTM, CNNRNN, and gnb, all algorithms had a specificity greater than 0.900 (Table 3 and Fig. 4). Taken together, PBNN performed best among these AI algorithms in predicting post-thoracoscopic pulmonary complications.
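Unlike the threshold-based metrics, AUC is computed from the predicted scores themselves. One way to read it is as the probability that a randomly chosen patient with a PPC receives a higher predicted score than a randomly chosen patient without one, with ties counted as one half. The sketch below computes AUC by that pairwise definition; the function name, labels, and scores are invented for illustration, and the quadratic loop is intended for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented labels and predicted risk scores for illustration only.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.1]
print(round(roc_auc(labels, scores), 3))
```

Because AUC does not depend on a decision threshold, a model can rank well on AUC while ranking lower on recall or F1 at a fixed cutoff, which is consistent with different algorithms leading on different metrics above.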

Discussion
Because of their high incidence, PPCs associated with pneumonectomy are a major contributor to prolonged hospitalization, increased postoperative mortality, and higher medical costs 21,22. In spite of the use of perioperative pulmonary protective ventilation strategies and minimally invasive thoracoscopic techniques, the incidence of PPCs remains between 12 and 50% 23. As a result, preventing PPCs is crucial to the prognosis of post-thoracoscopic patients. In this study, PBNN was found to outperform the other AI algorithms in predicting PPCs.
The feature weighting produced by the LGBM algorithm indicated that the main factors for developing pulmonary complications after thoracoscopy were single-lung ventilation duration, smoking history, surgery duration, ASA score, and blood glucose. The occurrence of PPCs has been shown to be closely related to preoperative interstitial pneumonia and smoking history 24. A predictive risk model for PPCs can be constructed using age, smoking status, and postoperative 1-s forced expiratory volume 25. PPCs are also associated with prolonged surgery times 26. In multivariate analysis, the risk factors associated with increased prevalence of PPCs were ASA physical status ≥ III and surgery duration > 5 h 27. Surgery duration, one-lung ventilation duration, and ASA score are significant predictors of PPCs after thoracic surgery 28. The incidence of PPCs was 10.9% among the 6,063 patients who were analyzed, and factors such as advanced age, ASA score, and surgery duration ≥ 1 h were the main determinants of pulmonary complications 29. PPCs have been reported to be significantly influenced by smoking, postoperative blood glucose, and ventilation duration in patients undergoing noncardiac surgery 30. Additionally, diabetic patients have a higher risk of pulmonary complications during the perioperative period of coronary artery bypass surgery than do non-diabetics 31. Our findings are consistent with these reports.
This study has limitations. Firstly, it was a retrospective study conducted at a single center, and it is therefore subject to single-center bias. Cross-validation was used for internal validation; however, further multicenter and prospective studies are needed. Furthermore, this retrospective study did not include detailed information on intraoperative hemodynamic fluctuations, postoperative pain, or its treatment. The results of this study suggest that AI algorithms such as PBNN can be used to assess the possibility of pulmonary complications after thoracoscopy, and to identify high-risk groups prior to surgery. Moreover, PBNN's accuracy

Figure 2. Variable importance of features included in machine learning algorithm.